ABSTRACT

A study investigated factors in the design of oral
language comprehension tests and their relevance in determining
actual language proficiency, focusing on the tests' decoding
requirements. Task-based, interactive oral language tests were
designed: (1) to elicit information about integration of first (L1)
and second languages (L2) in the learner's language processing; and
(2) to avoid assessment of language skills in isolation. The
resulting tests were piloted, and information was gathered on overall 
task fulfillment and on different aspects of the tasks. One test 
module, in which the task involved researching an issue and reporting 
results, illustrates the technique. The examinee's report and 
interview concerning the research were recorded, and the examinee 
then wrote a report in L1. The report contains five sections, each
graded separately on the extent to which the candidate was able to 
identify and integrate information needed to fulfill the task. 
Analysis of test performance suggests that conventional oral scales 
concerned with surface features of language are not adequate 
predictors of ability to comprehend in oral interactions. In 
addition, in this test it appeared that comprehension skills were 
closely related to other complex and integrative skills such as 
ability to write a coherent report in L1. (MSE)
ARE YOU DECODING ME? THE ASSESSMENT OF UNDERSTANDING
IN ORAL INTERACTION

Jill Schrafnagl and Duncan Cameron

Institute of Linguists 

This paper addresses the question of the comprehension of spoken discourse
and its role in language proficiency and proficiency assessment. It will draw
upon various aspects of our work in this area, including data from our recent
pilot testing. Our main contention will be that, although comprehension skills
are a sine qua non in spoken communication, they cannot be assessed directly,
as many language tests, including the US FSI interview and most FL oral tests
used in Britain, claim to do. A central concern in our work on L2 proficiency is
whether the significance of what is said or written has been recovered by the
hearer or reader, or what significance she has imposed on the discourse. We
shall be discussing the problems of assessing proficiency in the decoding of
spoken language, looking first at how this is currently approached in second
language testing. Subsequently, we have a few comments to make on the subjects
of language proficiency itself, on proficiency and performance testing, and on
comprehension during oral interaction.

Finally, we shall outline our work on the design and development of tests of
second language proficiency, and in particular their decoding requirements, and
look at some of the data from pilot tests involving L2 oral interactive
performance.

The aim of the project is to develop examinations of foreign or second
language proficiency at levels from beginners up to first degree-equivalence for
use in the 1990s and beyond. The syllabus should be applicable to all languages
likely to be in demand - currently, the Institute of Linguists runs examinations
in some 40 languages at lower levels, dropping to 20 at the highest level.

A syllabus for all-comers examinations which set out to test proficiency in 
using a foreign or second language for communicative purpose, whether the 
user's ends are social, professional, or both, must be sensitive to the heteroge- 
neity of language learners and users and the multiplicity of their goals. The types 
of proficiency tested need to be derived not from L2 classroom language use 
but from how languages are used naturally, by native speakers and bilinguals at 
any stage in their learning and in personal or professional domains. (It is not
possible in this paper to discuss the practice of applying an ideal native speaker
model, a monolingual model generally, to the bilingual. We do have considerable
reservations about this, and about the convergence implications it has, and
we have taken a rather different point of departure in our assessment criteria.)

The tests we have designed are task-based. The tasks themselves are based
on what people need to be able to do when called on to use a second language
in addition to their first, or generally preferred language, in contexts where
communication, and not learning, is the primary objective.

The patterns of L2 use which emerge from investigations of so-called real-life
language use in context are complex. In performance, a learner's two (or
more) languages are integrated in a variety of ways. The second language user
also typically lives out his/her role as a bilingual, mediating between cultures
and the members of different speech communities. This role as mediator and
facilitator of cross-cultural, cross-lingual communication is particularly
important at higher levels of proficiency, where learners are increasingly able to,
and motivated to, apply their L2 competence not only in personal but also in
professional contexts. These features of second language use have hitherto largely
been ignored in testing, even in what is declared to be proficiency testing. The
conventional approach is to assess L2 proficiency monolingually, the only usual
exception being tests of translation or interpreting.

A further tendency is to assess skills in isolation, and this too has very little
to do with natural language use. The skills and subskills are typically integrated
in many different configurations in natural language use, including second
language use. It only makes sense to de-integrate, or segregate, skills when
testing for particular experimental and diagnostic purposes. Where teaching
programmes are based on a contrived separation of skills, the (achievement)
testing that follows will tend to take the same approach; the arguments for this
are circular and generally not predicated on any construct of language
competence, any theory or descriptive framework for language use, or any model
of language proficiency.

The terms of our research and development brief require us to design, pilot,
modify and retrial proficiency tests and assessment systems in a range of
languages and have the new examinations ready to go into production by 1989/90.
We have thus had to combine experimental and pilot phases of testing, and are
about to enter the third and final phase of piloting. The material we have piloted
to date consists of integrated task-based tests involving French, Spanish, Italian
and currently Chinese and English. In our presentation of data we shall be
concentrating on examples where the oral/aural interactional comprehension skills
are an essential component of the language proficiency performance being
tested.

First, however, it may be helpful to consider how the comprehension of
spoken discourse is currently approached elsewhere in second language testing.
It is frequently assumed that the sort of examination task that is conventionally
known as "listening comprehension" is a valid indicator of a learner's ability to
decode language in its aural form. It is widely recognized, of course, that the
assessment of decoding skills is fraught with difficulties; it may still,
nevertheless, be worthwhile to look in more detail at some of these.

One basic objection is a very familiar one - that the "listening
comprehension" exercise has to elicit some sort of observable behaviour in
relation to a process which, in natural language use, need not have any visible
or observable manifestations. So failure to perform adequately on such tasks may
be a result of the nature of the behaviour-eliciting task, and not of a
deficiency in the aural decoding skill. There are, however, further objections.
Aural decoding in a natural context typically takes place within each decoder's
mental frameworks; these allow the decoder to predict the drift of a message, to
block or ignore irrelevant or uninteresting messages, and to select only those
bits of message that are relevant for whatever purpose the decoder has at just
that moment when he is listening. The skills required by the same decoder when
he is participating in a "listening comprehension" exercise, without these
comfortable props, are rather different; failure to decode a "listening
comprehension" exercise cannot, then, be taken as evidence of an inability to
decode in contexts of natural language use. There is a further serious objection
to the interpretation of "listening comprehension" exercises as valid
instruments for assessing aural decoding skills. What the classic listening
comprehension exercise by its very nature is unable to simulate is that in
natural, non-examination, language use, aural messages which are to be decoded
are typically negotiable. The listener - or "receiver of the message" - in
face-to-face interaction is able to encourage the supplier of the message to
adjust, rephrase or recapitulate the message until he or she is confident of
having received and understood just that information that the receiver wants.

(For the purpose of this paper the word interactive will be used to describe 
this sort of listening, realizing that the word has been used in a rather different 
sense in relation to reading and other listening processes). 

The conventional listening comprehension test is a test of non-interactive
listening skills. While for many, learners and native speakers alike, an
important non-interactive listening scenario is "entertainment" listening -
listening to radio, television or plays - non-interactive listening is a more
narrowly-based skill in many professional and vocational uses of languages,
being restricted to listening to presentations and lectures in a work or
learning context. A proficiency examination could not claim that a wide enough
range of listening proficiency skills can be assessed solely by means of a
non-interactive listening exercise.

It could be claimed that oral examinations are designed to assess interactive
listening skills. Unfortunately, many oral examinations are designed typically
to elicit a sample of language and then to assess it by means of various
fluency/accuracy scales. A different type of oral examination can be devised in
which the candidate 'controls' the interaction and negotiates with the
interlocutor to elicit a message in a form consonant with his or her ability to
decode it. The relationship between interactants in which one holds information
and the other has to apply appropriate interactional strategies to access the
information can be simulated in the examination context by tasks in which
candidates have to work within an information or opinion collecting context
defined by a brief. The conventional interview examination is typically assessed
by means of rating scales that concern themselves with such features as quality
of language, pronunciation, fluency, and so on - features derived from an
analysis of the surface features of language, and not in any way related to the
decoding processes that are taking place while the interaction develops.

We need to understand a great deal more than we do now about how people
interact to negotiate meaning and the nature of the processing routines involved,
and about what makes a good interactant and a good listener.

In oral L2 testing, such phenomena as pauses, misalignments, signals of
comprehension monitoring and confirmation elicitation are too easily dismissed
as dysfluencies marking off the L2 speaker's shortfall from the idealized
native-speaker behaviour; but they could equally well be efficient and effective
interactional strategies employed to get the interlocutor to modify his/her input.

So we need to know more about the way interaction is modified in different 
situational contexts, particularly, how native speakers and non-native speakers 
interact. The body of research into interactional competence has focused
largely on conversation analysis of L1 speakers, and we should be cautious about
generalizing from this to bilingual and L2 contexts.

Comprehension studies also tend to focus on communication breakdown
and repair mechanisms. It is not by any means always apparent in oral
interactions when or how that breakdown has occurred, particularly not in a test
situation.

It must be remembered that we were not concerned with developmental
acquisitional issues except very indirectly, in that we are sampling proficiency
at different levels. We were seeking to design proficiency tests which would
sample performance on tasks simulating major features of those encountered in
the use of one's two languages outside the classroom.

The resulting test design at the highest levels, which are very approximately
equivalent to A-level and first degree level, is modular. Each module requires
testees to perform complex integrated tasks. The modules differ in the area of
language use to which L2 proficiency is applied - for example, political and
social research, international journalism, media monitoring, negotiation and
representation, information services, transactions via the telephone or
correspondence. Consequently, the modules differ, too, in the range of texts and
tasks which are encountered; they require varying arrays of skills and subskills;
and differing roles are assumed by the L2 and L1 in every case. For the
examinations themselves, candidates will select from the range of modules those
which best reflect their interests, goals and perceived needs and attempt any
module whenever they feel ready. Thus, examinations are not only focused on
language use, they are also centred on the user.

In sampling performance on realistic occupationally-oriented tasks, these
tests are in some ways similar to language performance testing as developed,
for instance, by Marjorie Wesche and her colleagues in the Ontario Test of
English as a Second Language. There are also major differences; where OTESL
uses banding of skills in a number of separate scales, performance in our tests
is assessed entirely on task fulfilment criteria (the "driving test" approach).

During the piloting of tests we have not only been assessing overall task ful- 
filment but have been collecting data on different aspects of the tasks. The data 
produced on one particular module form the basis of the following discussion. 
The module is degree-level, and the task is one of researching an issue using a 
variety of sources and reporting the results. 

In the pilot material from which the data are taken, the L1 was English and
the L2 was German. The task specification required the testee to research and
report on the situation of the migrant worker, or Gastarbeiter, in the Federal
Republic of Germany, specifically the policies on assimilation, integration,
repatriation and recent changes in approaches to the education of the children of
migrant workers. The putative end-user of the report is a British MEP who needs
to be briefed for his/her work on a minorities committee. The detailed task remit
specified the aspects on which information had to be obtained and evaluated.
Two types of source were available: a dossier of printed materials including
surveys, editorials, excerpts from government reports and statistics, and an
interview with a German representative of a nongovernmental minority rights
commission. The interviewee would, if asked, be able to update and flesh out
the information from the published materials. The interview subtask thus
incorporated an information gap situation with pre-allocated roles. The oral input
was contextualised by both the task specification and the dossier of materials.
The input information is held constant by also giving the interviewee a detailed
specification and by training interlocutors, to ensure consistency of response to
the testee's questions. The interviewer - the candidate in other words - has an
active planning role, has to produce comprehensible output and manage the
interaction so that he/she elicits comprehensible input, and has constantly to
check his/her intake and update his/her mental representation. At the same time,
the interviewer must record the relevant information, using whatever aids and
strategies are preferred: note-taking in the L2, extempore translation and notes
in the L1, ticking against some checklist prepared along with the interview prompt
notes, and so on. Strategies are, of course, not prescribed.

The interviewee will not always stick to the point; after all, it is the
interviewer's point, that is, it is the candidate who is pursuing a particular
objective. So the interviewee may - is in fact briefed to - introduce opinion
(flagged as such) along with fact. The candidate has to be able to distinguish
reported fact from opinion. The results of the research are then written up in a
report in the L1. This is a cognitively demanding task, as are all the other
modules and tasks at this level. The assessment of task fulfilment looks solely
at the evidence in the report. We are concerned with the different factors in
task performance only in so far as they contribute to the required outcome. In
our pilot tests, however, we are also investigating relationships between
performance on subtasks and overall task fulfilment. We are, after all, assuming
with our test design that success in the outcome is only attainable if the
constituent processes are successful. We are also, less explicitly but no less
importantly, claiming that conventional approaches to the assessment of L2
proficiency on separate accuracy and fluency scales for the four macroskills are
inadequate to establish the kind of performance proficiency that will get you
and keep you a demanding job using languages. Specifically, the
structural-lexical-phonological-fluency scales used in so many tests of second
language proficiency reflect a totally inadequate model of interactional
language competence. While the examination was being administered, the
native-speaker German interview interlocutor did not assess the candidates'
performance in any way, restricting herself to the role of information supplier
as detailed in the task brief. Each oral interaction was recorded on a tape
recorder.

Following the interview, in which candidates sought to supplement information
gleaned from the written dossier, reports were written in English (that
is, the learner's L1) according to the task brief, fleshing out information from
different sources. It is important to realise that the carrying out of the task
brief depended upon this process of synthesis. It was impossible, in other words,
to mark the success of an adequate report according to a check-list of
essential points whose origin (from either text or interview) could be clearly
identified.

The report contained five sections; each section was awarded a maximum
of four points. Each omission or misrepresentation of information in any
section led to a subtraction of one point, giving a range in each section from
zero to four. This measure on a scale of 0 - 20 (Fig. 1 column A) indicates the
extent to which each candidate was able to identify and put together the
information needed to fulfill the task brief.
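
The scale A rule described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic as stated (five sections, four points each, one point lost per omission or misrepresentation, floor of zero per section); the function name and the input format are assumptions, not part of the original marking scheme.

```python
def scale_a_score(errors_per_section):
    """Return a 0-20 scale A score.

    errors_per_section: five non-negative integers, one per report
    section, counting omissions or misrepresentations (hypothetical
    input format chosen for this sketch).
    """
    if len(errors_per_section) != 5:
        raise ValueError("scale A expects exactly five report sections")
    if any(e < 0 for e in errors_per_section):
        raise ValueError("error counts cannot be negative")
    # Each section starts at 4 points and cannot fall below 0.
    return sum(max(0, 4 - e) for e in errors_per_section)
```

For instance, a report with one error in the first section and two in the second, and none elsewhere, would score 3 + 2 + 4 + 4 + 4 = 17.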

Scale B in Fig. 1 is concerned more directly with the success or failure of the
interview - measuring success or failure as the ability of the
candidate/interviewer to extract the necessary information from the
interviewee/interlocutor - in other words, to fulfill the terms of the task
brief as it applied to the interview.

There were seven different topic areas where information from the written
sources needed to be complemented by additional or relativizing information
which could be obtained only from the interview. Each report was given a 0 - 7
rating depending on whether the interview information in each of these areas
had been accurately and fully transmitted.
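
As a sketch of the scale B rating just described, assuming one point per topic area whose interview information was accurately and fully transmitted (the function name and the boolean input are illustrative assumptions):

```python
def scale_b_score(areas_transmitted):
    """Return a 0-7 scale B rating.

    areas_transmitted: seven booleans, one per topic area, True when the
    interview-only information for that area appears accurately and
    fully in the report (hypothetical input format).
    """
    if len(areas_transmitted) != 7:
        raise ValueError("scale B expects exactly seven topic areas")
    # One point for each area judged accurately and fully transmitted.
    return sum(1 for ok in areas_transmitted if ok)
```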

It was found that there was a very close association between scales A and
B. All candidates who gained the 7-point maximum on scale B gained a high score
on scale A (candidates 2, 4, 5, 9, 11, 13, 19).

Only one high scorer on scale A gained less than 7 on scale B (candidate
15).

It could be argued that the close relationship between success or failure on 
these two scales was caused by the overlap between them - both are arrived at 
by means of an analysis of the same product - the candidates' final report. 

Information measured by scale B is also necessary for success on scale A. 

Reports were assessed according to a third measure, which would appear,
on the face of it, to be concerned with quite different skills from those measured
by scales A and B. Reports were given a score out of nine (see scale C)
according simply to their effectiveness as reports, without reference to the
information appearing in them. Four points maximum were awarded for the coherence
of each report, taking into account logical presentation and use of sources as
exemplification. A maximum of 4 points were given for appropriacy of register and
clarity of linguistic presentation. One bonus point was awarded for readability.
Perhaps surprisingly, success or failure on this largely L1 skill is just as
closely related to each of the previous two scales as scales A and B are related
to each other. Five of the highest scorers on this scale C (2, 4, 5, 9, 11) were
the highest scorers on each of the other two scales; only one high scorer on
scale C (15) was not one of the highest on scale B.
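
The composition of scale C described above (up to four points for coherence, up to four for register and clarity, one bonus point for readability) can be written out as a small sketch. The class and field names are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScaleCRating:
    """Hypothetical container for one report's scale C marks."""
    coherence: int             # 0-4: logical presentation, use of sources
    register_and_clarity: int  # 0-4: appropriacy of register, clarity
    readable: bool             # True if the readability bonus is awarded

    def total(self) -> int:
        """Return the 0-9 scale C score."""
        for name in ("coherence", "register_and_clarity"):
            value = getattr(self, name)
            if not 0 <= value <= 4:
                raise ValueError(f"{name} must be between 0 and 4")
        return self.coherence + self.register_and_clarity + (1 if self.readable else 0)
```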

A fourth scale - Scale D - was concerned with conventional quality of
language features in the candidates' spoken language. Marks were awarded by a
German native speaker listening to a tape recording of each interaction. The
sixteen-point scale consisted of four categories of four points each:

(i) accuracy and range of structure

(ii) accuracy of phonology, stress and intonation 

(iii) range and accuracy of lexis 

(iv) fluency

Points from zero to four were awarded impressionistically in each category. 
High scores on this scale do not have a close relationship with scores on the 
other scales - for example, candidate 17 got a high score on scale D, but low 
scores on the other three scales, while candidate 4, with the same score on the
oral scale, was also one of the high scorers on the other scales.

Spearman rank-difference correlation coefficients were calculated
for all of the cross-correlations between the four scales. The results (Fig. 2)
bear out the observations made during the preceding discussion. Cross-correlations
between measures A, B and C are all very high and very comfortably above the
.01 significance level for 19 cases. In striking contrast, cross-correlations
between measure D - concerned with surface features of L2 language - and the
measures derived from the L1 report are very low, and do not approach
significance level.
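
For readers who wish to reproduce this kind of analysis, the following is a minimal sketch of a rank correlation between two sets of scores. It ranks each set (tied scores share the average rank) and then takes the Pearson correlation of the ranks, which is one standard way of computing Spearman's coefficient when ties are present. The function names are illustrative, and no scores from Fig. 1 are reproduced here.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from highest (rank 1) down; ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the two rank lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when all scores are equal")
    return cov / sqrt(var_x * var_y)
```

Applied to each pair of the four scales in turn, a function of this kind yields the six cross-correlations; each coefficient would then be compared with the critical value for the sample size used.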

It would appear, then, that conventional oral scales concerned with surface
features of language are not adequate predictors of ability to comprehend in oral
interactions. What is perhaps surprising is that our results indicate that
comprehension skills of the type we were concerned with are closely related to
other complex and integrative skills - such as the ability to write a coherent
report in the L1.
Score and rank order data on four assessment scales